feat: rebalance associative bitwise/Add/Multiply chains to avoid protobuf recursion limit by schenksj · Pull Request #4588 · apache/datafusion-comet

schenksj · 2026-06-04T01:46:44Z

Closes #4577.

Follow-up to #4531 (deep And/Or chains). Protobuf's recursion limit (100) applies to
any deeply nested BinaryExpr, so a long left-deep chain of other associative operators
overflows the same way when the serialized plan is re-parsed.

What

Extend the rebalancing (flattenAssociative + a balanced O(log n)-depth tree) to:

BitwiseAnd / BitwiseOr / BitwiseXor — always integral and exactly associative, so
they reuse the existing createBalancedBinaryExpr directly.
Add / Multiply — gated via isAssociativeAndRebalanceable to integral types in
LEGACY (wrapping, modular) eval mode, the only exactly-associative case. Float isn't
associative (Spark's ReorderAssociativeOperator excludes it too); ANSI/TRY make
integer-overflow position observable and grouping changes it; decimal precision grows
per op. Those keep the existing left-deep serialization. Add/Multiply emit a
MathExpr (eval_mode + return_type) rather than a BinaryExpr, so a new
createBalancedMathExpr builds the balanced tree with the chain's uniform type and
eval mode at every inner node.

Tests

Mirror #4531: project 200-deep chains and assert Comet runs them natively with results
matching Spark (which also verifies the associativity guarantee).

Not addressed (by design)

Deep Add/Multiply chains that are ANSI/TRY, floating-point, or decimal are not
rebalanced — they aren't exactly associative, so regrouping could change the result or which
intermediate overflows. They keep the pre-existing left-deep serialization, so a sufficiently
deep such chain can still hit the protobuf recursion limit on re-parse, exactly as before this
PR. This is a deliberate correctness-over-depth trade-off, and strictly better than the prior
state (which had no rebalancing for any of these operators).

) Follow-up to apache#4531 (deep And/Or chains). Protobuf's recursion limit (100) applies to any deeply nested BinaryExpr, so a long left-deep chain of other associative operators overflows the same way when the serialized plan is re-parsed. Extend the rebalancing (flattenAssociative + a balanced O(log n)-depth tree) to: - BitwiseAnd / BitwiseOr / BitwiseXor: always integral and exactly associative, so they reuse the existing createBalancedBinaryExpr directly. - Add / Multiply: gated via isAssociativeAndRebalanceable to integral types in LEGACY (wrapping, modular) eval mode -- the only exactly-associative case. Float isn't associative (Spark's ReorderAssociativeOperator excludes it too); ANSI/TRY make integer overflow position (which the grouping changes) observable; decimal precision grows per op. Those keep the existing left-deep serialization. Add and Multiply emit a MathExpr (eval_mode + return_type) rather than a BinaryExpr, so a new createBalancedMathExpr builds the balanced tree with the chain's uniform type and eval mode at every inner node. Tests mirror apache#4531: project 200-deep chains and assert Comet runs them natively with results matching Spark (which also verifies the associativity guarantee). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…associative-binary # Conflicts: # spark/src/main/scala/org/apache/comet/serde/arithmetic.scala # spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala

andygrove · 2026-07-02T16:45:01Z

Nice extension of the #4531 rebalancing. I read through the compatibility reasoning carefully and it holds up. A few notes and questions below.

Compatibility

The load-bearing claim is that serializing every inner node of the balanced tree with a single returnType/evalMode is valid, and it is. Whenever Spark's type coercion widens an operand it inserts a Cast, and Cast doesn't match the flattenAssociative predicate, so it terminates flattening. That means every leaf within a maximal chain already shares that chain's type, and the single returnType/evalMode is exactly what the left-deep serialization would have emitted node by node. A mixed-width case like byteCol + byteCol + intCol stops flattening at the Cast(..., Int) boundary, so the inner byte add is serialized separately with its own type.

The integral + LEGACY gate is also correct. Modular add/multiply are associative at every integer width, so regrouping yields the same residue, and the float/ANSI/TRY/decimal exclusions are all justified. I couldn't find a case where the rebalanced tree diverges from Spark.

Questions / suggestions

The gate means deep Add/Multiply chains that are ANSI, float, or decimal are still serialized left-deep and will still overflow the recursion limit on re-parse. That's unavoidable given they aren't safely associative, and it's strictly better than before. Could you confirm those cases degrade gracefully (fall back) rather than crash, and maybe add a one-line note in the description that they remain unfixed by design?
In the deep bitwise test, the col("_1") + lit(i) leaves are Add nodes that take the non-rebalanced serde under default ANSI on Spark 4.0. The values stay small so nothing throws and the bitwise chain is still what's under test, so it's harmless. A short comment noting the leaves are intentionally cheap adds would save a future reader a head-scratch.
Optional: createBalancedMathExpr duplicates most of createBalancedBinaryExpr, differing only in the inner-node builder. The recursive build could be factored into one shared helper if you felt like it, but the current form reads fine and matches the existing style.

Overall this looks correct and tightly scoped. The multiply test is a nice touch since 200 terms genuinely overflow and wrap, exercising the associativity guarantee rather than staying in a safe range.

andygrove

Thanks for following up on this @schenksj

Addresses review feedback on apache#4588: note that the col + lit leaves take the non-rebalanced serde but stay small so nothing throws -- the bitwise chain is what's under test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

schenksj · 2026-07-02T19:46:43Z

Thanks for the careful read, @andygrove — all three addressed:

1. ANSI / float / decimal deep chains. Added a "Not addressed (by design)" section to the description. On the fallback question, to be precise rather than reassuring: these don't degrade gracefully — they retain the exact pre-PR behavior. Serde still succeeds (it just emits the left-deep proto), so there's no plan-time fallback trigger, and Comet doesn't raise prost's default recursion limit on the native side (nor is there a runtime-fallback path), so a sufficiently deep such chain still errors on re-parse (JVM parseFrom / native prost decode) exactly as before #4577. So this PR neither fixes nor regresses them — it's strictly additive for the safely-associative cases. If we wanted those to fall back cleanly instead of erroring, that'd be a separate change (e.g. a depth guard that declines to native), which I'm happy to file as a follow-up if you think it's worth it.

2. Deep-bitwise test leaves. Added a comment noting the col("_1") + lit(i) leaves are intentionally cheap Adds — they take the non-rebalanced serde but stay small so nothing overflows/throws; the deep bitwise chain is what's under test. (commit pushed)

3. Shared build helper. Left as-is per your note that the current form reads fine and matches the existing style — the two builders differ only in the inner-node construction (MathExpr with eval_mode/return_type vs plain BinaryExpr), and keeping each self-contained avoids threading a node-builder callback through a shared recursion. Happy to factor it out if you'd prefer.

schenksj and others added 2 commits June 3, 2026 21:43

Merge branch 'main' into fix/4577-rebalance-associative-binary

6ac9483

schenksj mentioned this pull request Jun 10, 2026

[Tracking] feat(contrib): Native Delta Lake scan via delta-kernel-rs (Iceberg-style contrib) #4366

Draft

Merge remote-tracking branch 'upstream/main' into fix/4577-rebalance-…

3a2fabf

…associative-binary # Conflicts: # spark/src/main/scala/org/apache/comet/serde/arithmetic.scala # spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala

andygrove approved these changes Jul 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: rebalance associative bitwise/Add/Multiply chains to avoid protobuf recursion limit#4588

feat: rebalance associative bitwise/Add/Multiply chains to avoid protobuf recursion limit#4588
schenksj wants to merge 4 commits into
apache:mainfrom
schenksj:fix/4577-rebalance-associative-binary

schenksj commented Jun 4, 2026 •

edited

Loading

Uh oh!

andygrove commented Jul 2, 2026

Uh oh!

andygrove left a comment

Uh oh!

schenksj commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

schenksj commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Tests

Not addressed (by design)

Uh oh!

andygrove commented Jul 2, 2026

Compatibility

Questions / suggestions

Uh oh!

andygrove left a comment

Choose a reason for hiding this comment

Uh oh!

schenksj commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

schenksj commented Jun 4, 2026 •

edited

Loading